Portfolio Notes

Portfolio Check

Module 01 Portfolio Content

  • Evidence worksheet_01
    • Completion status: X
    • Comments:
  • Evidence worksheet_02
    • Completion status: X
    • Comments:
  • Evidence worksheet_03
    • Completion status: X
    • Comments:
  • Problem Set_01
    • Completion status: X
    • Comments:
  • Problem Set_02
    • Completion status: X
    • Comments:
  • Writing assessment_01
    • Completion status: X
    • Comments:
  • Additional Readings
    • Completion status: X
    • Comments:

Data Science

  • Installation check
    • Completion status: X
    • Comments:
  • Portfolio repo setup
    • Completion status: X
    • Comments:
  • RMarkdown Pretty PDF Challenge
    • Completion status: X
    • Comments: I would pat that dog.
  • ggplot
    • Completion status: 9+0.5/10
    • Comments:

Module 02 Portfolio Content

  • Evidence worksheet_04
    • Completion status: X
    • Comments:
  • Problem Set_03
    • Completion status: X
    • Comments:
  • Writing assessment_02
    • CANCELED
  • Additional Readings
    • Completion status: X
    • Comments:

Module 03 Portfolio Content

  • Evidence worksheet_05
    • Completion status: X
    • Comments:
  • Problem set_04
    • Completion status: X
    • Comments:
  • Writing Assessment_03
    • Completion status:
    • Comments:
  • Additional Readings
    • Completion status:
    • Comments:

Project 1

  • CATME account setup and survey
    • Completion status: X
    • Comments:
  • CATME interim group assessment
    • Completion status: X
    • Comments:
  • Project 1
    • Report (80%): Looking good!
    • Participation (20%):

Module 04 Portfolio Content

Project 2

  • CATME final group assessment
    • Completion status:
    • Comments:
  • Project 2
    • Report (80%):
    • Participation (20%):

Module 01

Data science Friday

Installation check

Portfolio Repo Setup

Detail the code you used to create, initialize, and push your portfolio repo to GitHub. This will be helpful as you will need to repeat many of these steps to update your portfolio throughout the course.

git config --global user.name "Jonah Lin"

git config --global user.email "1jonahlin1@gmail.com"

… Set up MICB425_Materials folder in relevant place …

mkdir MICB425_Portfolio

cd MICB425_Portfolio

git init

git add .

git commit -m "State commit message here"

git remote add origin git@github.com:IStrykerI/MICB425_Portfolio.git

git remote -v

git push -u origin master

… Needed key to get to this repo since it’s locked. Regular submit codes below …

git add .

git commit -m "State commit message here"

git push

Plotting Data in R

# Set-up
# install.packages("tidyverse")
library(tidyverse) 
## -- Attaching packages --------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1     v purrr   0.2.4
## v tibble  1.4.2     v dplyr   0.7.4
## v tidyr   0.8.0     v stringr 1.3.0
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
metadata <- read.table(file="Data/Saanich.metadata.txt", header=TRUE, row.names=1, sep="\t", na.strings=c("NAN", "NA", "."))
OTUdata <- read.table(file="Data/Saanich.OTU.txt", header=TRUE, row.names=1, sep="\t", na.strings=c("NAN", "NA", "."))
# source("https://bioconductor.org/biocLite.R")  
# biocLite("phyloseq")  
library(phyloseq)  
load("Data/phyloseq_object.RData")
physeq_percent = transform_sample_counts(physeq, function(x) 100 * x/sum(x))

# Exercise 1
# Plot of NH4 with purple triangles
ggplot(metadata, aes(x=NH4_uM, y=Depth_m)) +
  geom_point(color="Purple", shape=17)

# Exercise 2
# Convert Celsius to Fahrenheit and create dot plot of temperature in Fahrenheit against depth
Fahr_Data = metadata %>% mutate(Temperature_F = (Temperature_C*9/5) + 32) %>% select(Temperature_F, Depth_m)

ggplot(Fahr_Data, aes(x=Temperature_F, y=Depth_m)) + geom_point()

# Exercise 3
# Title addition with more descriptive x and y axis labels
plot_bar(physeq_percent, fill="Class") + 
  geom_bar(aes(fill=Class), stat="identity") + ggtitle("Classes from 10 to 200 m in Saanich Inlet") + xlab("Sample Depth") + ylab("Percent Relative Abundance") + theme(plot.title = element_text(size = 6))

# Exercise 4
# Select nutrient concentrations
Nutrient_Concentrations = metadata %>% select(Depth_m, O2_uM, PO4_uM, SiO2_uM, NO3_uM, NH4_uM, NO2_uM)

# Collapse all nutrient concentrations into depths
Nutrient_Depths = gather(Nutrient_Concentrations, "Nutrients", "uM", -1)

# Plot faceted figure of all nutrient concentrations
ggplot(Nutrient_Depths, aes(x=Depth_m, y=uM)) + geom_point() + geom_line() + facet_wrap(~Nutrients, scales="free_y")
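
The percent transformation applied to `physeq` in the set-up can be checked on a small example. The sketch below reuses the same anonymous function on a made-up count vector; the names `to_percent` and `toy_counts` and the counts themselves are hypothetical and do not come from the Saanich data.

```r
# Same function body that is passed to transform_sample_counts() in the set-up.
to_percent <- function(x) 100 * x / sum(x)

# Hypothetical OTU counts for a single sample (illustrative values only).
toy_counts <- c(OTU_a = 10, OTU_b = 30, OTU_c = 60)

# Each count becomes its share of the sample total, so the result sums to 100.
toy_percent <- to_percent(toy_counts)
print(toy_percent)
print(sum(toy_percent))
```

Because every sample is rescaled to the same total, samples with different sequencing depths become comparable in the stacked bar plot.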

RMarkdown Pretty PDF Challenge

Paste your code from the in-class activity of recreating the example PDF.

R Markdown PDF Challenge

The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.

http://phdcomics.com/ Comic posted 4-25-2018

Challenge Goals

The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)

Hint: Go to the PhD Comics Website to see if you can find the image above.
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown.

Here’s a Header!

Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (You can most easily tell this from the table of contents).

Another header, now with maths

Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:

1231521+12341556280987
## [1] 1.234156e+13

Table Time

Or maybe, after you’ve added those numbers, you feel like it’s about time for a table!
I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (More on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.

library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")
I made this table with kable in the knitr package library
speed            dist
Min.   : 4.0     Min.   :  2.00
1st Qu.:12.0     1st Qu.: 26.00
Median :15.0     Median : 36.00
Mean   :15.4     Mean   : 42.98
3rd Qu.:19.0     3rd Qu.: 56.00
Max.   :25.0     Max.   :120.00

And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun GIF of your choice!

Origins and Earth Systems

Evidence Worksheet_01 “Prokaryotes: The Unseen Majority”

Learning Objectives

Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.

General Questions

  • What were the main Questions being asked?
    Main Questions:
    • What is the actual number of prokaryotes on Earth?
    • What is the total amount of their cellular C on Earth?
  • What were the primary methodological approaches used?
    Primary Methodological Approaches: Usage of various papers for estimating the number of prokaryotes in various habitats, total C content (Mainly), turnover times, and cellular production rates

  • Summarize the main results or findings.
    Summary of Main Results/Findings:
    • Total C of prokaryotes on earth is approx. 60-100% of total C found in plants
    • Prokaryotes also contain large amounts of N/P/Other essential nutrients
    • Turnover rates are higher for surface prokaryotes than subsurface prokaryotes
    • Highest cellular productivity is found in open ocean (More mutations/rare genetic events are likely to occur)
    • Mutations = Major source of genetic diversity and essential to formation of novel species
  • Do new Questions arise from the results?
    New Questions:
    • More information is needed for subsurface prokaryotes
    • More detailed knowledge is needed regarding prokaryotic diversity
  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
    Specific Advantages:
    • Lots of tables with numbers for calculations (Approx. population in each primary habitats, total C contents in surface/subsurface prokaryotes and plants, turnover rates, simultaneous mutation rates, etc.)
    • Papers were logical for the most part when explaining their interpretations (Assumptions were also stated for the most part)
    Specific Challenges:
    • Some information regarding the subsurface was only an approximation or estimate (Data was derived at one site from only one study)
    • Some studies used logarithmic extrapolations rather than arithmetic averages (Could interfere with interpretation of results)
    • Grouping of prokaryotes into 3 primary habitats only (Ignoring other habitats such as in the air, leaves, animals, insects, etc.)

Evidence Worksheet_02 “Life and the Evolution of Earth’s Atmosphere”

Learning Objectives

Comment on the emergence of microbial life and the evolution of Earth systems

General Questions

  • Indicate the key events in the evolution of Earth systems at each approximate moment in the time series. If times need to be adjusted or added to the timeline to fully account for the development of Earth systems, please do so.

    • 4.6 billion years ago
      • Formation of solar system (Inner planets received water vapour and C)
    • 4.5 billion years ago
      • Formation of moon (Gave Earth spin and tilt, day-night cycle, and seasons)
    • 4.2 billion years ago
      • Meteorite Bombardment (Earth couldn’t have been a permanent habitation)
    • 3.8 billion years ago
      • Meteorite Bombardment Halted (Sea water chemistry stabilized)
      • Earliest sign that life began (Possibly existed via anoxygenic photosynthesis)
    • 3.75 billion years ago
      • Early Methanogenesis
    • 3.5 billion years ago
      • Presence of Rubisco (Implies global oxygenic photosynthesis)
      • Evolution of cyanobacteria
      • Presence of sulphate (Implies localized non-reducing conditions, but not necessarily the presence of O2)
    • 3.0 billion years ago
      • Glaciation
      • Evidence of photosynthesis (Stromatolites from Cyanobacteria)
      • Presence of Cyanobacteria -> Emergence of Eukaryotes
    • 2.7 billion years ago
      • Presence of hydrocarbon biomarkers and 2α-methylhopanes (Implies oxygenic photosynthesis from Cyanobacteria)
      • Presence of steranes (Implies presence of Eukaryotes)
    • 2.2 billion years ago
      • Glaciation
      • Large increase in oxygen level (Microaerobic Early Atmosphere -> Oxic Air)
      • Existence of redbeds (Implies oxidation)
      • Appearance of complex eukaryotes may be involved in sharp increase in O2 level
      • Cellular cybernetic switch between mitochondria and chloroplasts may control the link between photosynthesis, CO2, and N fixations
    • 2.1 billion years ago
      • Evolution of multicellular life
    • 1.3 billion years ago
      • Evolution of Eukaryotes
    • 550 million years ago
      • Cambrian Explosion (Expansion of Multicellular Evolution)
      • Emergence of Land Plants (Increased oxygenation of atmosphere)
    • 400 million years ago
      • Emergence of Animals
    • 200,000 years ago
      • Evolution of Homo sapiens
  • Describe the dominant physical and chemical characteristics of Earth systems at the following waypoints:

    • Hadean
      • Physical = Mainly glacial surface (100 °C and lower due to CO2 cooling) with intervals of hot, molten surface from meteorite impacts (500 °C)
      • Chemical = Heavy CO2 and N2 atmosphere with H2 and water vapour
    • Archean
      • Physical = Heat flow was high due to radioactivity and decay (Powers plate tectonics and volcanism) followed by glaciation near the end of Eon
      • Chemical = Atmosphere lacked O2 and was mainly CH4:CO2 composition
    • Precambrian
      • Physical = Collection of Earth’s landmasses into a single supercontinent (Rodinia; Pangaea formed later, in the Phanerozoic) with a number of glacial periods
      • Chemical = Atmosphere composition was mainly N2, CO2, and other inert gases (Atmosphere lacked O2 until emergence of photosynthetic life forms)
    • Proterozoic
      • Physical = High tectonic activity that led to the formation of mountains and glaciation period
      • Chemical = Sharp increase in O2 from Cyanobacteria’s oxygenic photosynthesis
    • Phanerozoic
      • Physical = Global temperatures warm enough to support complex life
      • Chemical = Increased oxygenation of atmosphere due to emergence of land plants

Evidence Worksheet_03 “The Anthropocene”

Learning Objectives

Evaluate human impacts on the ecology and biogeochemistry of Earth systems.

General Questions

  • What were the main questions being asked? Main Questions:
    • Did human intervention weaken our confidence in the uniformity of the course of nature?
    • Why shouldn’t other changes as extraordinary and unprecedented happen from time to time?
    • If a new cause was permitted to supervene, differing in kind and energy from any before in operation, why can’t others have come into action at different epochs?
    • How can the experience of one period be standard to which we can refer all natural phenomena of other periods?
  • What were the primary methodological approaches used? Addressing aspects of geobiology of Anthropocene to address whether:
    • Anthropocene is recognizable among other geological epochs
    • When did this begin and when will it end
    • What, among all the many features of the geobiological record of the Anthropocene, will be most recognizable millions of years in the future
  • Summarize the main results or findings.
    • Transition from Holocene to Anthropocene began at different times according to views:
      • Near the end of the 18th century (During the invention of the steam engine)
      • Began 7000 years ago with the development of agriculture due to deforestation (Hypothesis not supported by carbon cycle)
    • Anthropocene is recognizable among other geological epochs due to rapid population growth:
      • There might not be enough resources to sustain population in the future (Food supplies will need to be doubled by 2050)
      • Greenhouse gas emissions increase with population increase due to consumption of resources as prosperity rises (Use of fossil carbon, increase in meat diet that requires more grain and pastures to sustain, etc.)
      • Destruction of habitats from land use (Deforestation for timber, palm oil, pastures/croplands, and human settlements)
      • Destruction of habitats in marine environments (Overfishing, pollution with extra nutrients/toxins, acidification, habitat destruction through trawling, and climate change)
      • Extra use of N/P/S affects major biogeochemical cycles (N/P for agriculture, and S for coal combustion)
      • Large contribution to carbon cycle from industrial activities (Leads to ocean acidification by lowering calcium carbonate saturation state in surface ocean and warming of climate due to burning of fossil fuels faster than uptake by sinks in ocean and terrestrial biosphere)
    • Future of Anthropocene:
      • Destruction of nature except for species/ecosystems that serve some human-centered purpose will ultimately drive the collapse of human society
      • A focus on ecosystem services could lead to conservation, and changing human behavior may help minimize human influence on environments (Reducing carbon dioxide through various processes and reducing solar radiation)
      • Human technology has reduced dependence on natural world
  • Do new questions arise from the results?
    • Will the global decline of biodiversity over the next many millennia ever come close to the enormous loss of more than 90% of species at the end-Permian extinction?
    • How will the ecological responses to climate changes be coupled with all other stresses discussed above?
    • How will species extinctions reduce the resilience of the remaining communities?
  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
    Sufficient information was provided for explaining key points. All key points were addressed in a logical manner (From beginning to human impacts to future prospects of humanity’s future). Did not really address the question “What, among all the many features of the geobiological record of the Anthropocene, will be most recognizable millions of years in the future” clearly, only stating what might happen in the near future.

Problem Set_01

Learning Objectives:

Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.

Specific Questions:

  • What are the primary prokaryotic habitats on Earth and how do they vary with respect to their capacity to support life? Provide a breakdown of total cell abundance for each primary habitat from the tables provided in the text.

    Primary Prokaryotic Habitats on Earth:
    • Aquatic Environments = 1.2 * 10^29 Cells
    • Subsurface = 3.8 * 10^30 Cells
    • Soil = 2.6 * 10^29 Cells

    How do they vary with respect to their capacity to support life:
    Aquatic environments have the highest rate of cellular productivity while subsurface environments have the lowest rate of cellular productivity between the 3 habitats (Even though they have the highest population).
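
    The breakdown above can be turned into shares of the global total with a few lines of R. This is a minimal sketch that only uses the three abundances listed in this answer; the variable names are illustrative.

    ```r
    # Cell abundance per primary habitat, as listed in the breakdown above.
    cells <- c(Aquatic = 1.2e29, Subsurface = 3.8e30, Soil = 2.6e29)

    # Total across the three habitats and each habitat's share of that total.
    total_cells <- sum(cells)
    percent_share <- round(100 * cells / total_cells, 1)

    print(total_cells)
    print(percent_share)
    ```

    With these inputs the subsurface accounts for roughly 91% of all cells, which is why its low productivity stands out.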

  • What is the estimated prokaryotic cell abundance in the upper 200 m of the ocean and what fraction of this biomass is represented by marine cyanobacterium including Prochlorococcus? What is the significance of this ratio with respect to carbon cycling in the ocean and the atmospheric composition of the Earth?

    Estimated Prokaryotic Cell Abundance in Upper 200m of Ocean: 3.6 * 10^28
    Fraction represented by marine cyanobacterium (+ Prochlorococcus): (4 * 10^4) / (5 * 10^5) * 100 = 8%
    Significance of this ratio with respect to C cycling in ocean and atmospheric composition of Earth:
    Approx. 8% of these prokaryotes (Cyanobacteria + Prochlorococcus) are contributing to the conversion of CO2 to O2
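
    The 8% figure can be reproduced, and scaled to a cell count, as follows. This is a sketch under one assumption: the two densities are treated as cells/mL, a unit the answer above does not state. All variable names are illustrative.

    ```r
    # Densities used in the fraction above (cells/mL is an assumed unit).
    cyanobacteria_density <- 4e4
    total_prokaryote_density <- 5e5

    # Fraction and percentage of prokaryotes that are cyanobacteria.
    cyano_fraction <- cyanobacteria_density / total_prokaryote_density
    cyano_percent <- 100 * cyano_fraction

    # Apply the same fraction to the upper-200 m abundance quoted above.
    upper_ocean_cells <- 3.6e28
    cyano_cells <- cyano_fraction * upper_ocean_cells

    print(cyano_percent)
    print(cyano_cells)
    ```

    Applying the fraction to the whole population assumes the density ratio holds throughout the upper 200 m.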

  • What is the difference between an autotroph, heterotroph, and a lithotroph based on information provided in the text?

    Difference Between Autotroph/Heterotroph/Lithotroph:
    • Autotroph = Photosynthetic, assimilate inorganic carbon (CO2 -> Biomass)
    • Heterotroph = Assimilate organic carbon
    • Lithotroph = Assimilate inorganic substrate

    Based on information provided in text.

  • Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?

    Deepest Habitat: 4 km (Terrestrial) and 10.9 - 14.9 km (Marine)
    Primary Limiting Factor: Temperature (125 °C)

  • Based on information provided in the text and your knowledge of geography what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?

    Highest Habitat: 77 km (In reality ~20 km above surface)
    Primary Limiting Factor(s): Stable Space/Resources/Radiation/Lack of Moisture

  • Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?

    Vertical Distance of Earth’s Biosphere: ~24 - 44 km

  • How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)

    Annual Cellular Production of Prokaryotes:
    Population * (Turnovers/Yr) = Cells/Yr
    3.6 * 10^28 Cells * (365 Days/Yr / 16 Days/Turnover) = 8.2 * 10^29 Cells/Yr
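
    The same calculation in R, as a minimal sketch using only the population and the 16-day turnover time from the example above (variable names are illustrative):

    ```r
    # Inputs from the worked example above.
    population <- 3.6e28        # cells in the upper 200 m of the ocean
    turnover_time_days <- 16    # days per turnover

    # Convert the turnover time into turnovers per year, then scale the population.
    turnovers_per_year <- 365 / turnover_time_days
    annual_production <- population * turnovers_per_year

    print(turnovers_per_year)
    print(signif(annual_production, 2))
    ```

    The day units cancel, leaving cells per year.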

  • What is the relationship between carbon content, carbon assimilation efficiency and turnover rates in the upper 200m of the ocean? Why does this vary with depth in the ocean and between terrestrial and marine habitats?

    Relationship between C content, C assimilation efficiency, and turnover rates in the upper 200m of ocean:
    Due to the high turnover rates in the upper 200m of ocean and the estimated low C assimilation efficiency (0.2), the C content will be low since the majority of C will be used to support the turnover of prokaryotes and not assimilated.
    This varies with depth in ocean and between terrestrial and marine habitats because as the depth increases, the turnover rate decreases due to low metabolic activity. This in turn leads to higher C contents since the turnover of prokaryotes in deeper depths becomes low enough for C to become assimilated.

  • How were the frequency numbers for four simultaneous mutations in shared genes determined for marine heterotrophs and marine autotrophs given an average mutation rate of 4 x 10^-7 per DNA replication? (Provide an example of the calculation with units. Hint: cell and generation cancel out)

    Frequency Number for 4 Simultaneous Mutations in Shared Genes:
    Average Mutation Rate = 4 * 10^-7 Per DNA Replication
    Turnover Rate = 365 / 16 = 22.8 Turnovers/Yr
    (4 * 10^-7)^4 = 2.56 * 10^-26 Mutations/Generation
    3.6 * 10^28 Cells * 22.8 Turnovers/Yr = 8.2 * 10^29 Cells/Yr
    8.2 * 10^29 Cells/Yr * 2.56 * 10^-26 Mutations/Generation = 2.1 * 10^4 Mutations/Yr
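
    The steps above can be sketched in R. The inputs are the mutation rate, population, and 16-day turnover time used in this answer; the final conversion to hours between events is an added illustration, and the variable names are hypothetical.

    ```r
    # Probability of four simultaneous mutations per cell per generation.
    mutation_rate <- 4e-7
    n_simultaneous <- 4
    p_four_mutations <- mutation_rate^n_simultaneous

    # Cells produced per year in the upper 200 m (population * turnovers per year).
    cells_per_year <- 3.6e28 * 365 / 16

    # Expected number of four-mutation events per year.
    events_per_year <- cells_per_year * p_four_mutations

    # Same quantity expressed as the average waiting time between events.
    hours_between_events <- 365 * 24 / events_per_year

    print(p_four_mutations)
    print(signif(events_per_year, 2))
    print(round(hours_between_events, 2))
    ```

    Expressed as a waiting time, about 2.1 * 10^4 events per year corresponds to one such event roughly every 0.4 hours.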

  • Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?

    Implications:
    • Higher genetic diversity due to high mutation rate
    • Higher adaptive potential to environment due to high mutation rate (Natural selection of prokaryotic cells - Favours ones that contain mutation to help with survival)

    No: Point mutations are not the only way in which microbial genomes diversify and adapt. There can also be HGT between other bacteria, different levels of gene regulation/expression, insertions/deletions, etc.

  • What relationships can be inferred between prokaryotic abundance, diversity, and metabolic potential based on the information provided in the text?

    Relationships Between Prokaryotic Abundance, Diversity, and Metabolic Potential: High Prokaryotic Abundance <-> Higher Diversity <-> Higher Metabolic Potentials (More prokaryotes will lead to higher diversity via mutations and mutations could contribute to better genes that help with metabolism)

Problem Set_02

Learning Objectives:

Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.

Specific Questions:

  • What are the primary geophysical and biogeochemical processes that create and sustain conditions for life on Earth? How do abiotic versus biotic processes vary with respect to matter and energy transformation and how are they interconnected?

    Primary Geophysical Processes = Tectonics and atmospheric photochemical processes
    Primary Biogeochemical Processes = Microbially catalyzed, thermodynamically constrained redox reactions
    Abiotic vs. Biotic Processes: Abiotic processes usually supply biotic processes with substrates (Biotic processes use up energy to sustain life and use up matter to produce “waste products”, whereas abiotic processes create energy via transformation of “waste products” into substrates usable by microbes)

  • Why is Earth’s redox state considered an emergent property?

    Earth’s Redox State = Emergent Property of Microbial Life on Planetary Scale
    • First 5 elements (C/H/N/O/S) driven largely by microbially catalyzed, thermodynamically constrained redox reactions
    • C/S/P dependent on tectonics (Volcanism/Rock Weathering)
    • Biogeochemical cycles evolved on planetary scale to form set of nested abiotically driven acid-base and biologically driven redox reactions (Sets lower limits on external energy required to sustain cycles)
    • Feedbacks between evolution of microbial metabolic geochemical processes create average redox conditions of oceans/atmosphere
    • Biological Oxidation = Driven by photosynthesis
  • How do reversible electron transfer reactions give rise to element and nutrient cycles at different ecological scales? What strategies do microbes use to overcome thermodynamic barriers to reversible electron flow?

    Reversible electron transfer reactions –> Element + Nutrient Cycles at different ecological scales? Steps:
    1. Energy of light from photosynthesis oxidizes electron donor
    2. The electrons + protons generated in the process are used to reduce inorganic C to organic matter with higher energy bonds
    3. Resulting oxidizing metabolites may serve as electron acceptors in aerobic or anaerobic respiration for photosynthetic organisms that use these “waste products” as oxidants
    4. Nutrients may be buried in sediments and returned to biosphere via mountain building and subsequent erosion/geothermal activity
    Strategies used by microbes to overcome thermodynamic barriers to reversible electron flow:
    • Reduction of CO2 with H2 (If H tension is sufficiently low, the reversible process becomes thermodynamically favorable). This may require the help of multispecies assemblages (H-Consuming sulfate reducers). Uses differences in concentration of substrates for overcoming thermodynamic barriers
    • Citric acid cycle oxidizes acetate stepwise into CO2. Breaking down elements in stepwise fashion uses less energy than breaking down elements directly into its most basic elements
    • Use of enzymes in converting N2 to NH4+. Enzymes reduce the energy required to overcome thermodynamic barriers
    • Microbes work together in communities to overcome thermodynamic barriers. Uses products of other microbe population as substrates
  • Using information provided in the text, describe how the nitrogen cycle partitions between different redox “niches” and microbial groups. Is there a relationship between the nitrogen cycle and climate change?

    N Cycle
    • N Fixation transforms N2 to NH4+ for use in synthesis of proteins and nucleic acids (Requires mechanism for protecting enzyme from oxygen by spatially or temporally segregating nitrogen fixation from aerobic environments)
    • Oxidation of ammonia to hydroxylamine via ammonia monooxygenase
    Relationship between N cycle and climate change:
    • Synthetic N from fertilizers leads to excess N in N cycle
    • Excess N may lead to algae blooms (Nutrient enrichment and warm waters)
    • Excess N may affect denitrification process in freshwater (Influx of N > Efflux of N)
  • What is the relationship between microbial diversity and metabolic diversity and how does this relate to the discovery of new protein families from microbial community genomes?

    Relationship between microbial diversity and metabolic diversity: Higher microbial diversity leads to higher metabolic diversity since different microbes may have a better chance of survival under different conditions using a different resource for metabolism (High metabolic diversity)
    Relation to discovery of new protein families from microbial community genomes: Microbial communities with high diversity (In terms of species and metabolic diversity) will have a higher chance of finding new protein families due to mutations creating more efficient or functionally different proteins in microbes.

  • On what basis do the authors consider microbes the guardians of metabolism?

    Microbes = Guardians of Metabolism
    • Dispersal of core planetary gene set (VGT/HGT)
    • Selective pressure leads to evolution of boutique genes that protect metabolic pathway (Enables retention of fundamental redox processes using microbes as vessels)

Module 01 Writing

Module 01 References

Achenbach J. 2012. Spaceship Earth: A new view of environmentalism. The Washington Post. Link

Canfield DE, Glazer AN, Falkowski PG. 2010. The Evolution and Future of Earth’s Nitrogen Cycle. Science. 330:192-196. Link

Falkowski PG, Fenchel T, Delong EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science. 320(5879):1034-1039. Link

Kasting JF, Siefert JL. 2002. Life and the Evolution of Earth’s Atmosphere. Science. 296:1066-1068. Link

Leopold A, Schwartz CW. 1949. A Sand County Almanac: With Other Essays on Conservation from Round River. Enl. ed. N/A

Nisbet EG, Sleep NH. 2001. The habitat and nature of early life. Nature. 409(6823):1083-1091. N/A

Rockström J, Steffen W, Noone K, Scheffer M, et al. 2009. A safe operating space for humanity. Nature. 461(7263):472-475. N/A

Schrag DP. 2012. Geobiology of the Anthropocene. Fundamentals of Geobiology. Chapter 22. Link

Suddick EC, Whitney P, Townsend AR, and Davidson EA. 2013. The role of nitrogen in climate change and the impacts of nitrogen-climate interactions in the United States: foreword to thematic issue. Biogeochemistry. 114(3):1-10. Link

Whitman WB, Coleman DC, and Wiebe WJ. 1998. Prokaryotes: The Unseen Majority. Proc Natl Acad Sci USA. 95(12):6578-6583. PMC33863

Zehnder AJB. 1988. Biology of anaerobic microorganisms. Research in Microbiology. Chapter 1. Link

Module 02

Remapping the Body of the World

Evidence worksheet_04 “Bacterial Rhodopsin Gene Expression”

Learning objectives

Discuss the relationship between microbial community structure and metabolic diversity
Evaluate common methods for studying the diversity of microbial communities
Recognize basic design elements in metagenomic workflows

General Questions

  • What were the main questions being asked? Main Questions:
    • What is the physiological basis of light-activated growth stimulation?
    • What are the various specific functions and physiological roles of diverse marine microbial PRs?
  • What were the primary methodological approaches used?
    • Screening fosmid library for in vivo PR photosystem expression
    • Genomic analysis of candidate PR photosystem-expressing clones
    • Genetic and phenotypic analysis of PR photosystem
    • Light-activated proton translocation
    • PR-driven proton translocation results in photophosphorylation in E. coli
  • Summarize the main results or findings.
    • Large-insert libraries increase the probability of capturing complete metabolic pathways in a single clone, but their low copy number decreases the sensitivity of detecting heterologous gene expression
    • Increasing fosmid copy number can significantly enhance detectable levels of recombinant gene expression and the detection rate of desired phenotypes in metagenomic libraries
    • Only 6 genes are required to fully enable light-activated proton translocation and photophosphorylation in a heterologous host (necessary and sufficient for complete synthesis and assembly of a fully functional PR photoprotein)
    • Illumination of cells expressing native marine bacterial PR photosystem generates a proton-motive force that does indeed drive cellular ATP synthesis
    • PR-based phototrophy plays a significant role in planktonic marine microorganisms
  • Do new questions arise from the results?
    • Are there alternative approaches for detecting all PR-containing clones?
  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
    • Diagrams were helpful in understanding graphs
    • Use of E. coli was logical in showing experimental results

Problem Set_03

Learning objectives:

Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.

Specific Questions:

  • How many prokaryotic divisions have been described and how many have no cultured representatives (microbial dark matter)?

    • 89 bacterial phyla
    • 20 archaeal phyla
    • Up to 1500 MDM (microbes that live in the shadow biosphere)
  • How many metagenome sequencing projects are currently available in the public domain and what types of environments are they sourced from?

    • ~110K on EBI
    • Types of environments: Sediments/Soil/Gut/Aquatic/Etc.
  • What types of on-line resources are available for warehousing and/or analyzing environmental sequence information (provide names, URLS and applications)?

    Shotgun Metagenomics:
    • Assembly = EULER
    • Binning = S-GSOM
    • Annotation = KEGG
    • Analysis Pipelines = MEGAN 5
    • IMG/M
    • MG-RAST
    • NCBI
    Marker Gene Metagenomics:
    • Standalone Software = OTUbase
    • Analysis Pipelines = SILVA
    • Databases = Ribosomal Database Project (RDP)
    • Denoising = AmpliconNoise
  • What is the difference between phylogenetic and functional gene anchors and how can they be used in metagenome analysis?

    Phylogenetic:
    • Vertical gene transfer
    • Carry phylogenetic information (Allows tree reconstruction)
    • Taxonomic
    • Ideally single-copy
    Functional:
    • More horizontal gene transfer
    • ID specific biogeochemical functions associated with measurable effects
    • Not as useful for phylogeny
  • What is metagenomic sequence binning? What types of algorithmic approaches are used to produce sequence bins? What are some risks and opportunities associated with using sequence bins for metabolic reconstruction of uncultivated microorganisms?

    Types of Algorithms:
    1. Align sequences to database
    2. Group to each other based on DNA characteristics (GC Content, Codon Usage, etc.)
    Risks/Opportunities:
    • Incomplete coverage of the genome sequence
    • Contamination with sequences from different phylogenetic lineages
  • Is there an alternative to metagenomic shotgun sequencing that can be used to access the metabolic potential of uncultivated microorganisms? What are some risks and opportunities associated with this alternative?

    • Single Cell Sequencing
    • Enrichment Culturing
    • Functional Screens (Biochemical, etc.)
    • 3rd Gen Sequencing (Nanopore)
    • FISH
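The composition-based binning approach noted in the binning question above (grouping sequences by DNA characteristics such as GC content) can be sketched in a few lines of base R. This is only an illustration: the contig sequences, the `gc_content()` helper, and the 0.5 cutoff are all hypothetical, and real binning tools combine several signals rather than a single threshold.

```r
# Hypothetical contigs, made up for illustration only
contigs <- c(
  contig_a = "ATATATTTAAATATTA",
  contig_b = "GCGCGGCCGCGGGCCG",
  contig_c = "ATTTAAGATATAAATT",
  contig_d = "GGCCGCGCCGGGCGCC"
)

# Fraction of G and C bases in one sequence
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  sum(bases %in% c("G", "C")) / length(bases)
}

# GC fraction for every contig (names are carried over from `contigs`)
gc <- vapply(contigs, gc_content, numeric(1))

# Assumed cutoff of 0.5 separating a low-GC bin from a high-GC bin
bins <- ifelse(gc < 0.5, "low_GC", "high_GC")

# List of contig names per bin
binned <- split(names(contigs), bins)
binned
```

Contigs with similar GC content end up in the same bin, which is the basic idea behind grouping sequences to each other without a reference database.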

Module 02 References

Madsen EL. 2005. Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nature Reviews Microbiology. 3:439-446. PMID15864265

Martinez A, Bradley AS, Waldbauer JR, Summons RE, DeLong EF. 2007. Proteorhodopsin Photosystem Gene Expression Enables Photophosphorylation in a Heterologous Host. Proceedings of the National Academy of Sciences of the United States of America. 104:5590-5595. Link

Wooley J, Godzik A, Friedberg I. 2010. A Primer on Metagenomics. PLoS Computational Biology. 6:e1000667. Link

Module 03

Evidence worksheet_05 “Extensive mosaic structure of uropathogenic E. coli”

Part 1: Learning Objectives

Evaluate the concept of microbial species based on environmental surveys and cultivation studies
Explain the relationship between microdiversity, genomic diversity and metabolic potential
Comment on the forces mediating divergence and cohesion in natural microbial communities

General Questions:

  • What were the main questions being asked?
    • Why do some E. coli strains live as harmless commensals in animal intestines while other distinct genotypes cause significant morbidity and mortality as human intestinal pathogens?
    • How can we understand the genetic bases for pathogenicity and evolutionary diversity of E. coli based on comparisons of different pathogenic strains and a non-pathogenic strain?
  • What were the primary methodological approaches used?
    Pathogenic strain isolates were cloned and sequenced: whole-genome libraries were prepared from genomic DNA, random clones were sequenced with dye-terminator chemistry, and the data were collected on sequencers. Sequence data were assembled with SEQMANII, and the opposite ends of linking clones were sequenced using several PCR-based techniques and primer walking.
    End result: a whole-genome XhoI optical map was used to order contigs and confirm contig structure during assembly, providing an independent physical map of the whole genome.
    Sequences were analyzed and annotated with MAGPIE, ORFs were defined with GLIMMER, and predicted proteins were searched against a non-redundant database using BLAST.

  • Summarize the main results or findings.
    • Even though virulence plasmids are common to many E. coli, they are not usually associated with uropathogenic strains, and none were found in CFT073
    • Pathogenic strains had larger genomes than non-pathogenic strains due to many insertions of unique segments that encode known or potential virulence genes
    • More bias for rare codons in island ORFs than in backbone ORFs
    • Variation among E. coli uropathogenic strains due to different niches. Findings suggest that introduction of high pathogenicity island may have been one of the earliest events in evolution of extraintestinal E. coli
    • Fimbriae for pathogenesis are common in both pathogenic and non-pathogenic strains, but pathogenic strains have highly divergent proteins compared to MG1655 and EDL933 (Suggests selective pressure on expression of pilus has varied among E. coli lineages due to specificity of adhesin to individual target tissue)
    • Encoding of putative autotransporters confers virulence in CFT073, similar to other pathogenic bacteria such as enteroaggregative E. coli and Shigella flexneri
    • Hemolysin genes at pheV island encode cytolytic toxin and secretion apparatus, but when compared with originally characterized RTX determinants they are atypical (B/D secretion genes preceding A gene and lack of C-like gene that typically encodes a fatty acid modification enzyme). These differences in genes suggest that this locus encodes a unique class of RTX-like secreted protein
  • Do new questions arise from the results?
    • Why aren’t virulence plasmids usually associated with uropathogenic strains even though they are common to many E. coli isolates?
    • Why was the codon usage pattern in EDL933 backbone ORFs indistinguishable from that of the CFT073 backbone in the same test?
    • What are the unknown functions of the island genes shared by EDL933 and CFT073 (Or are associated with phage or insertion sequence elements)?
    • What is this unique class of RTX-like secreted protein encoded by the locus in CFT073?
    • Is there a way to assess the removal of genes that were detrimental to the uropathogenic lifestyle?
  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

    Sufficient background information was provided and the explanations/assumptions were logical, but the amount of information in this paper can be overwhelming for readers (even for a person who has some background in bacterial pathogenesis, microbiology, and immunology). Figure 3 was also helpful in understanding the comparisons between pathogenic strains of E. coli.

Part 2: Learning Objectives

Comment on the creative tension between gene loss, duplication and acquisition as it relates to microbial genome evolution
Identify common molecular signatures used to infer genomic identity and cohesion
Differentiate between mobile elements and different modes of gene transfer

Based on your reading and discussion notes, explain the meaning and content of the following figure derived from the comparative genomic analysis of three E. coli genomes by Welch et al. Remember that CFT073 is a uropathogenic strain and that EDL933 is an enterohemorrhagic strain. Explain how this study relates to your understanding of ecotype diversity. Provide a definition of ecotype in the context of the human body. Explain why certain subsets of genes in CFT073 provide adaptive traits under your ecological model and speculate on their mode of vertical descent or gene transfer.

  • Meaning and Content of Figure:
    The figure compares the locations and sizes of CFT073 and EDL933 islands. The vertical axis indicates island size and the horizontal axis indicates island position in the colinear backbone.

  • Explain how study relates to understanding of ecotype diversity:
    This study relates to our understanding of ecotype diversity since CFT073 and EDL933 are the same species but different strains, shaped by the ecosystems they occupy. Different ecosystems have different conditions for survival and require different sets of genes (CFT073 was isolated from the blood of a patient while EDL933 was most likely isolated from fecal samples).

  • Provide definition of ecotype in context of human body:
    An ecotype is a distinct form or race of a species occupying a particular habitat. Ecotypes are equivalent to strains (uropathogenic and enterohemorrhagic in this case).

  • Explain why certain subsets of genes in CFT073 provide adaptive traits under ecological model and speculate on their mode of vertical descent or gene transfer:
    Pathogenic traits that are encoded in islands are transferred horizontally, while ancestral backbone traits (genes that define the strains as a species) are transferred vertically. Larger genes are usually harder to transfer than smaller genes (larger genes tend to be lost in transfers).

Problem set_04

#To make tables
library(kableExtra)
library(knitr)
#To manipulate and plot data
library(tidyverse)
#R Calculations
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.4-6
# Part 1
Sample_Collection = data.frame(
  Num = c(1:29),
  Name = c("Skittles", "Skittles", "Skittles", "Skittles", "Skittles", "Gummy Bears", "Gummy Bears", "Gummy Bears", "Gummy Bears", "Gummy Bears", "M&M", "M&M", "M&M", "M&M", "M&M", "M&M", "Gummy Balls", "Gummy Balls", "Gummy Balls", "Mutated Candy", "Kisses", "Sour Bears", "Spiders", "Gummy Rods", "Gummy Rods", "Gummy Rods", "Gummy Rods", "Gummy Rods", "Long Gummy Rods"),
  characteristics = c("Red", "Green", "Brown", "Yellow", "Orange", "Green", "Pink", "Orange", "Red", "Yellow", "Red", "Orange", "Brown", "Green", "Yellow", "Blue", "Orange", "Green", "Yellow", "Red", "Wrapper", "Yellow", "Multi-Colour", "Orange", "Pink", "Green", "Yellow", "Red", "Pink"),
  Occurences = c(9,7,10,6,6,3,1,3,2,1,3,16,8,5,8,13,1,1,2,1,4,1,1,4,8,7,4,6,1))

# Questions
# 1) Refer to Sample_Collection for table
# 2) Collection of microbial cells from seawater doesn't really represent the actual diversity of microorganisms inhabiting waters along Line-P transect. Some different species were missed

# Part 2
Sample_Curve = data.frame(
  x = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29),
  y = c(1,2,3,3,4,5,6,6,6,7,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,9,9,9,10)
)

ggplot(Sample_Curve, aes(x=x, y=y)) +
  geom_point() +
  geom_smooth() +
  labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'

# Questions
# 1) Refer to Sample_Curve for collector's curve for sample
# 2) Curve almost flattens out at approx. 20 individuals
# 3) The shape of collector's curve showed that the depth of sampling is apparently enough to obtain majority of species in original community

# Part 3
# DIVERSITY: SIMPSON RECIPROCAL INDEX FOR COLLECTION = 3.855449
S1 = 4/142
S2 = 10/142
S3 = 29/142
S4 = 4/142
S5 = 53/142
S6 = 1/142
S7 = 1/142
S8 = 38/142
S9 = 1/142
S10 = 1/142

SRI = 1/(S1^2 + S2^2 + S3^2 + S4^2 + S5^2 + S6^2 + S7^2 + S8^2 + S9^2 + S10^2)

SRI_Comm = 1/0.20521

# RICHNESS: CHAO1 RICHNESS ESTIMATOR FOR COLLECTION = 11.333333
SCHAO1 = 10 + ((4^2)/(2*6))

SCHAO1_Comm = 17 + ((0^2)/(2*17))

# Questions:
# 1) SRI For Sample = 3.855
# 2) SRI for Original Total Community = 4.873
# 3) Chao1 Estimate For Sample = 11.333
# 4) Chao1 Estimate For Original Total Community = 17

# Part 4
Fixed_Diversity_Data = data.frame(
  Name = c("Skittles", "Gummy Bears", "M&M", "Gummy Balls", "Mutated Candy", "Kisses", "Sour Gummy Bears", "Spiders", "Gummy Rods", "Long Gummy Rods"), 
  Occurences = c(38,10,53,4,1,4,1,1,29,1))

Diversity_Data = 
  Fixed_Diversity_Data %>% 
  select(Name, Occurences) %>% 
  spread(Name, Occurences)

Diversity_Data
##   Gummy Balls Gummy Bears Gummy Rods Kisses Long Gummy Rods M&M
## 1           4          10         29      4               1  53
##   Mutated Candy Skittles Sour Gummy Bears Spiders
## 1             1       38                1       1
Real_SRI = diversity(Diversity_Data, index="invsimpson")

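# Note: specpool() computes incidence-based estimators across sites, so with a single row its chao value equals the observed richness
Real_SCHAO1 = specpool(Diversity_Data)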

Community_Data = data.frame(
  Name = c("Skittles", "Gummy Bears", "M&M", "Gummy Balls", "Mutated Candy", "Kisses", "Sour Gummy Bears", "Spiders", "Gummy Rods", "Large Gummy", "Sour Gummy Swirls", "Gummy Cokes", "Curly Gummy Lines", "Gummy Fruit", "Twizzlers", "Small Bricks", "Large Bricks"), 
  Occurences = c(192,102,221,24,2,16,3,6,173,2,3,3,7,2,14,15,3))

Community_Diversity_Data = 
  Community_Data %>% 
  select(Name, Occurences) %>% 
  spread(Name, Occurences)

Community_Diversity_Data
##   Curly Gummy Lines Gummy Balls Gummy Bears Gummy Cokes Gummy Fruit
## 1                 7          24         102           3           2
##   Gummy Rods Kisses Large Bricks Large Gummy M&M Mutated Candy Skittles
## 1        173     16            3           2 221             2      192
##   Small Bricks Sour Gummy Bears Sour Gummy Swirls Spiders Twizzlers
## 1           15                3                 3       6        14
Real_Comm_SRI = diversity(Community_Diversity_Data, index="invsimpson")

Real_Comm_SCHAO1 = specpool(Community_Diversity_Data)

# Questions:
# 1) R SRI For Sample = 3.855
# 2) R SRI for Original Total Community = 4.873
# 3) R Chao1 Estimate For Sample = 10
# 4) R Chao1 Estimate For Original Total Community = 17
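As a cross-check on Parts 3 and 4, the two indices can also be written as small base R functions that work directly on a vector of counts. This is a minimal sketch, not the course's reference method: the function names are hypothetical, and the Chao1 function assumes the bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singletons and doubletons. Its value need not match a hand calculation that uses a different form of the estimator or different singleton and doubleton counts.

```r
# Counts per candy type, copied from Fixed_Diversity_Data and Community_Data
sample_counts <- c(38, 10, 53, 4, 1, 4, 1, 1, 29, 1)
community_counts <- c(192, 102, 221, 24, 2, 16, 3, 6, 173,
                      2, 3, 3, 7, 2, 14, 15, 3)

# Inverse Simpson index: 1 / sum(p_i^2), where p_i is each type's proportion
inverse_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

# Chao1 in its bias-corrected form (an assumption; other forms exist):
# S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))
chao1_bias_corrected <- function(counts) {
  counts <- counts[counts > 0]   # ignore types that were never observed
  f1 <- sum(counts == 1)         # number of singletons
  f2 <- sum(counts == 2)         # number of doubletons
  length(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))
}

inverse_simpson(sample_counts)
inverse_simpson(community_counts)
chao1_bias_corrected(sample_counts)
chao1_bias_corrected(community_counts)
```

The inverse Simpson values reproduce the 3.855 and 4.873 reported above. For abundance data like these, vegan's `estimateR()` is the function intended for Chao1-type estimates.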

Module 03 Writing

Module 03 References

Callahan BJ, McMurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME Journal. 11:2639. Link

Gaudet AD, Ramer LM, Nakonechny J, Cragg JJ, Ramer MS. 2010. Small-group learning in an upper-level university biology class enhances academic performance and student attitudes toward group work. PLoS ONE. 5:e15821. Link

Hallam SJ, Torres-Beltrán M, Hawley AK. 2017. Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Scientific Data. 4:170158. Link

Hawley AK, Torres-Beltrán M, Zaikova E, et al. 2017. A compendium of multi-omic sequence information from the Saanich Inlet water column. Scientific Data. 4:170160. Link

Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental Microbiology. 12:118-123. Link

Sogin ML, Morrison HG, Huber JA, et al. 2006. Microbial Diversity in the Deep Sea and the Underexplored “Rare Biosphere”. Proceedings of the National Academy of Sciences of the United States of America. 103:12115-12120. Link

Torres-Beltrán M, Hawley AK, Capelle D, et al. 2017. A compendium of geochemical information from the Saanich Inlet water column. Scientific Data. 4:170159. Link

Welch RA, Burland V, Plunkett G, et al. 2002. Extensive Mosaic Structure Revealed by the Complete Genome Sequence of Uropathogenic Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America. 99:17020-17024. Link

Project 1

Project 2